Corpus-Based Thesaurus Construction for Image Retrieval in Specialist Domains
Abstract
This paper explores the use of texts that are related to an image collection, also known as collateral texts, for building thesauri in specialist domains to aid in image retrieval. Corpus linguistic and information extraction methods are used for identifying key terms and semantic relationships in specialist texts that may be used for query expansion purposes. The specialist domain context imposes certain constraints on the language used in the texts, which makes the texts computationally more tractable.
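As a rough illustration of the idea, and not the authors' actual method, the sketch below picks out candidate key terms by comparing word frequencies in a specialist corpus against a general reference corpus, then uses a small thesaurus to expand a query. The function names, the smoothing choice, the `min_ratio` threshold, and the sample data are all assumptions made for this example.

```python
from collections import Counter


def key_terms(specialist_tokens, general_tokens, min_ratio=2.0):
    """Return terms that are relatively more frequent in the specialist corpus.

    The score is the ratio of relative frequencies (specialist / general).
    Add-one smoothing on the general side keeps the ratio finite for words
    that never occur in the general corpus. This is an illustrative measure,
    not one taken from the paper.
    """
    spec = Counter(specialist_tokens)
    gen = Counter(general_tokens)
    spec_total = sum(spec.values())
    gen_total = sum(gen.values())

    scores = {}
    for token, count in spec.items():
        spec_rel = count / spec_total
        gen_rel = (gen[token] + 1) / (gen_total + 1)
        scores[token] = spec_rel / gen_rel

    kept = [t for t, s in scores.items() if s >= min_ratio]
    # Highest-scoring terms first.
    return sorted(kept, key=lambda t: scores[t], reverse=True)


def expand_query(query_terms, thesaurus):
    """Append thesaurus entries related to each query term, without duplicates."""
    expanded = list(query_terms)
    for term in query_terms:
        for related in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


if __name__ == "__main__":
    # Hypothetical toy corpora and thesaurus, for illustration only.
    specialist = "the bloodstain pattern shows the bloodstain near the fingerprint".split()
    general = "the cat sat on the mat and the dog saw the cat".split()
    print(key_terms(specialist, general))

    thesaurus = {"bloodstain": ["blood spatter", "stain"]}
    print(expand_query(["bloodstain", "knife"], thesaurus))
```

In practice the thesaurus entries would come from relationships extracted from the collateral texts, and the expanded query would then be matched against the text associated with each image.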
Similar resources
Automatic Thai Ontology Construction and Maintenance System
Ontologies are an essential resource for enhancing the performance of information processing systems in tasks such as information integration, document classification in taxonomies, information retrieval, and data cleaning in database systems. This paper proposes three methodologies for automatic Thai ontology construction and maintenance from technical corpora, dictionaries, and thesauri. For corpus base...
Building Thesaurus from Manual Sources and Automatic Scanned Texts
This paper describes the work done in the TIPS project on the construction of a thesaurus base. The base merges a manually built thesaurus with one automatically extracted from large text corpora. Several manually built thesauri have been semi-formatted so that they can be merged into a consistent common base. The automatic extraction is based on both syntax and statistics. We present in th...
Statistical Thesaurus Construction for a Morphologically Rich Language
Corpus-based thesaurus construction for Morphologically Rich Languages (MRL) is a complex task, due to the morphological variability of MRL. In this paper we explore alternative term representations, complemented by clustering of morphological variants. We introduce a generic algorithmic scheme for thesaurus construction in MRL, and demonstrate the empirical benefit of our methodology for a Heb...
Evaluation of a Thesaurus-Based Query Expansion Technique
based query expansion method for information retrieval. The query expansion process assigns weights to different types of relations obtained from vocabulary structures, providing an efficient way to measure distances between different terms. This method was applied to a Portuguese juridical corpus and evaluated over the top-27 queries used in the web site of the Portuguese Attorney General's Of...
Toward a Pan-Chinese Thesaurus
In this paper, we propose a corpus-based approach to the construction of a Pan-Chinese lexical resource, starting out with the aim to enrich existing Chinese thesauri in the Pan-Chinese context. The resulting thesaurus is thus expected to contain not only the core senses and usages of Chinese lexical items but also usages specific to individual Chinese speech communities. We introduce the ratio...